
    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human-centric -- where it is designed to augment, rather than replace, human creativity.

    Image Fusion via Sparse Regularization with Non-Convex Penalties

    The L1 norm regularized least squares method is often used for finding sparse approximate solutions and is widely used in 1-D signal restoration. Basis pursuit denoising (BPD) performs noise reduction in this way. However, a shortcoming of L1 norm regularization is that it underestimates the true solution. Recently, a class of non-convex penalties has been proposed to improve this situation. Such a penalty function is non-convex itself, but preserves the convexity of the whole cost function. This approach has been confirmed to offer good performance in 1-D signal denoising. This paper extends the aforementioned method to 2-D signals (images) and applies it to multisensor image fusion. The problem is posed as an inverse problem and a corresponding cost function is judiciously designed to include two data attachment terms. The whole cost function is proved to be convex upon suitably choosing the non-convex penalty, so that its minimization can be tackled by convex optimization approaches involving only simple computations. The performance of the proposed method is benchmarked against a number of state-of-the-art image fusion techniques and superior performance is demonstrated both visually and in terms of various assessment measures.
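    To make the two-data-term formulation concrete, the sketch below shows a minimal proximal-gradient-style iteration in Python/NumPy using firm thresholding, the shrinkage rule associated with the minimax-concave (MC) penalty, one common choice of non-convex penalty. The function names (firm_threshold, fuse_sparse), the step size, the penalty parameters and the decision to apply the penalty directly to the fused array (rather than to a sparse transform of it, as a practical fusion method would) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def firm_threshold(x, lam, mu):
    """Firm thresholding: the shrinkage rule associated with the minimax-concave
    (MC) penalty. Requires mu > lam; reduces to soft thresholding as mu -> infinity."""
    out = np.where(np.abs(x) <= lam, 0.0, x)
    mid = (np.abs(x) > lam) & (np.abs(x) <= mu)
    out = np.where(mid, np.sign(x) * mu * (np.abs(x) - lam) / (mu - lam), out)
    return out

def fuse_sparse(y1, y2, lam=0.1, mu=0.3, n_iter=200, step=0.4):
    """Fuse two aligned source images by approximately minimising
        F(x) = 0.5*||x - y1||^2 + 0.5*||x - y2||^2 + lam * phi_MC(x)
    with a gradient step on the two data attachment terms followed by firm
    thresholding. The overall cost stays convex only if the non-convex penalty
    is chosen weakly enough; mu > lam is just the firm-threshold condition."""
    x = 0.5 * (y1 + y2)                        # initial guess
    for _ in range(n_iter):
        grad = (x - y1) + (x - y2)             # gradient of the two data terms
        x = firm_threshold(x - step * grad, step * lam, step * mu)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((64, 64))
    y1 = base + 0.05 * rng.standard_normal((64, 64))   # sensor 1
    y2 = base + 0.05 * rng.standard_normal((64, 64))   # sensor 2
    fused = fuse_sparse(y1, y2)
    print("fusion residual:", np.mean((fused - base) ** 2))
```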

    Object recognition in atmospheric turbulence scenes

    The influence of atmospheric turbulence on acquired surveillance imagery poses significant challenges in image interpretation and scene analysis. Conventional approaches for target classification and tracking are less effective under such conditions. While deep-learning-based object detection methods have shown great success in normal conditions, they cannot be directly applied to atmospheric turbulence sequences. In this paper, we propose a novel framework that learns distorted features to detect and classify object types in turbulent environments. Specifically, we utilise deformable convolutions to handle spatial displacement caused by turbulence. Features are extracted using a feature pyramid network, and Faster R-CNN is employed as the object detector. Experimental results on a synthetic VOC dataset demonstrate that the proposed framework outperforms the benchmark with a mean Average Precision (mAP) score exceeding 30%. Additionally, subjective results on real data show significant improvement in performance.
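    The core architectural idea, letting the convolution kernel sample at learned offsets so it can follow turbulence-induced displacement, can be sketched with torchvision's DeformConv2d. The module below predicts per-pixel offsets from the incoming feature map and applies a 3x3 deformable convolution; the block name, channel sizes and zero-initialisation of the offsets are assumptions for illustration, and the surrounding feature pyramid network and Faster R-CNN detector are not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """A 3x3 deformable convolution whose sampling offsets are predicted from the
    input feature map, so the kernel can follow local turbulence-induced displacement."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sample point
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        nn.init.zeros_(self.offset_pred.weight)   # zero offsets: starts as an ordinary conv
        nn.init.zeros_(self.offset_pred.bias)

    def forward(self, x):
        offsets = self.offset_pred(x)
        return self.deform(x, offsets)

if __name__ == "__main__":
    feats = torch.randn(1, 64, 56, 56)   # a feature map, e.g. from one FPN level
    block = DeformBlock(64, 64)
    print(block(feats).shape)            # torch.Size([1, 64, 56, 56])
```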

    Towards a Robust Framework for NeRF Evaluation

    Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR, SSIM, LPIPS) provide only approximate indicators of performance, since they assess the entire NeRF pipeline as a whole. Hence, in this paper, we propose a new test framework which isolates the neural rendering network from the NeRF pipeline and then performs a parametric evaluation by training and evaluating the NeRF on an explicit radiance field representation. We also introduce a configurable approach for generating representations specifically for evaluation purposes. This employs ray-casting to transform mesh models into explicit NeRF samples, as well as to "shade" these representations. Combining these two approaches, we demonstrate how different "tasks" (scenes with different visual effects or learning strategies) and types of networks (NeRFs and depth-wise implicit neural representations (INRs)) can be evaluated within this framework. Additionally, we propose a novel metric to measure the task complexity of the framework, which accounts for the visual parameters and the distribution of the spatial data. Our approach offers the potential to create a comparative objective evaluation framework for NeRF methods.
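    A minimal NumPy sketch of the "parametric evaluation" idea is given below, under the assumption that the explicit radiance field is stored as a dense (r, g, b, sigma) voxel grid: ground-truth samples are drawn from the grid by trilinear interpolation, and a network's predictions at the same points are scored with PSNR. The grid layout, the function names (sample_explicit_field, field_psnr) and the use of PSNR over sampled points are illustrative choices; the ray-casting of mesh models and the shading step described above are not reproduced.

```python
import numpy as np

def sample_explicit_field(grid, pts):
    """Trilinearly interpolate an explicit radiance-field grid.
    grid: (D, H, W, 4) array of (r, g, b, sigma); pts: (N, 3) points in [0, 1]^3.
    Returns (N, 4) ground-truth targets for training / evaluating a network."""
    dims = np.array(grid.shape[:3]) - 1
    xyz = pts * dims                          # continuous voxel coordinates
    i0 = np.floor(xyz).astype(int)
    i1 = np.minimum(i0 + 1, dims)
    f = xyz - i0                              # fractional part per axis
    out = np.zeros((pts.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, f[:, 0], 1 - f[:, 0]) *
                     np.where(dy, f[:, 1], 1 - f[:, 1]) *
                     np.where(dz, f[:, 2], 1 - f[:, 2]))
                idx = (np.where(dx, i1[:, 0], i0[:, 0]),
                       np.where(dy, i1[:, 1], i0[:, 1]),
                       np.where(dz, i1[:, 2], i0[:, 2]))
                out += w[:, None] * grid[idx]
    return out

def field_psnr(pred, target):
    """PSNR between network predictions and explicit-field targets (values in [0, 1])."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(1.0 / mse)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.random((32, 32, 32, 4))        # toy explicit radiance field
    pts = rng.random((1024, 3))
    target = sample_explicit_field(grid, pts)
    pred = np.clip(target + 0.01 * rng.standard_normal(target.shape), 0, 1)
    print("parametric PSNR:", field_psnr(pred, target))
```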

    Unsupervised Image Fusion Using Deep Image Priors

    A significant number of researchers have applied deep learning methods to image fusion. However, most works require a large amount of training data or depend on pre-trained models or frameworks to capture features from source images. This is inevitably hampered by a shortage of training data or a mismatch between the framework and the actual problem. Deep Image Prior (DIP) has been introduced to exploit convolutional neural networks' ability to synthesize the 'prior' in the input image. However, the original design of DIP is difficult to generalise to multi-image processing problems, particularly image fusion. Therefore, we propose a new image fusion technique that extends DIP to fusion tasks formulated as inverse problems. Additionally, we apply a multi-channel approach to further enhance DIP's effect. The evaluation is conducted with several commonly used image fusion assessment metrics, and the results are compared with state-of-the-art image fusion methods. Our method outperforms these techniques for a range of metrics. In particular, it is shown to provide the best objective results for most metrics when applied to medical images.
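    The basic DIP mechanism extended to fusion can be sketched in a few lines of PyTorch: a fixed random tensor is pushed through a small untrained CNN, and the network weights (not the image) are optimised against fidelity terms for both source images. The tiny architecture, the plain sum of two MSE terms and the hyperparameters below are placeholder assumptions; the paper's actual fusion-specific loss and multi-channel extension are not reproduced.

```python
import torch
import torch.nn as nn

class TinyDIPNet(nn.Module):
    """A small CNN used as a deep image prior: the fused image is the network
    output for a fixed random input tensor."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.body(z)

def dip_fuse(y1, y2, steps=500, lr=1e-2):
    """Unsupervised fusion: optimise the network weights (not the image) so the
    output stays close to both aligned source images y1, y2 of shape (1, 1, H, W)."""
    z = 0.1 * torch.randn(1, 32, y1.shape[-2], y1.shape[-1])   # fixed noise input
    net = TinyDIPNet()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = net(z)
        loss = ((x - y1) ** 2).mean() + ((x - y2) ** 2).mean()  # two fidelity terms
        loss.backward()
        opt.step()
    return net(z).detach()

if __name__ == "__main__":
    y1 = torch.rand(1, 1, 64, 64)   # e.g. one modality / sensor
    y2 = torch.rand(1, 1, 64, 64)   # e.g. the other modality / sensor
    fused = dip_fuse(y1, y2, steps=50)
    print(fused.shape)
```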

    Optimal Transport-based Graph Matching for 3D retinal OCT image registration

    Registration of longitudinal optical coherence tomography (OCT) images assists disease monitoring and is essential in image fusion applications. Mouse retinal OCT images are often collected for longitudinal study of eye disease models such as uveitis, but their quality is often poor compared with human imaging. This paper presents a novel and efficient framework involving an optimal transport based graph matching (OT-GM) method for 3D mouse OCT image registration. We first perform registration of fundus-like images obtained by projecting all b-scans of a volume onto a plane orthogonal to them, hereafter referred to as the x-y plane. We introduce Adaptive Weighted Vessel Graph Descriptors (AWVGD) and 3D Cube Descriptors (CD) to identify the correspondence between nodes of graphs extracted from segmented vessels within the OCT projection images. The AWVGD comprises scaling, translation and rotation, which are computationally efficient, whereas the CD exploits 3D spatial and frequency domain information. The OT-GM method subsequently performs the correct alignment in the x-y plane. Finally, registration along the direction orthogonal to the x-y plane (the z-direction) is guided by the segmentation of two important anatomical features peculiar to mouse b-scans: the Internal Limiting Membrane (ILM) and the hyaloid remnant (HR). Both subjective and objective evaluation results demonstrate that our framework outperforms other well-established methods on mouse OCT images within a reasonable execution time.
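    The graph-matching step can be illustrated with a small entropy-regularised optimal transport (Sinkhorn) sketch in NumPy: nodes of the two vessel graphs are represented by descriptor vectors, a pairwise cost matrix is built from descriptor distances, and each node is assigned to the node receiving the most transported mass. The random descriptors stand in for the AWVGD/CD features, and the generic Sinkhorn formulation is assumed purely for illustration; it is not the paper's OT-GM method.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.05, n_iter=200):
    """Entropy-regularised optimal transport between node weight vectors a and b,
    given a pairwise descriptor cost matrix; returns the transport plan."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def match_graphs(desc1, desc2):
    """Match graph nodes by their descriptor vectors: each node of graph 1 is
    assigned the node of graph 2 that receives the most transported mass."""
    cost = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=-1)
    a = np.full(len(desc1), 1.0 / len(desc1))
    b = np.full(len(desc2), 1.0 / len(desc2))
    plan = sinkhorn(a, b, cost)
    return plan.argmax(axis=1)   # index in graph 2 for each node of graph 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    desc2 = rng.random((12, 8))                              # descriptors of graph 2 nodes
    perm = rng.permutation(12)
    desc1 = desc2[perm] + 0.01 * rng.standard_normal((12, 8))  # perturbed, permuted copy
    print((match_graphs(desc1, desc2) == perm).mean())        # fraction matched correctly
```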